A Morphological Lexicon of Esperanto with Morpheme Frequencies

Author

  • Eckhard Bick
Abstract

This paper discusses the internal structure of complex Esperanto words (CWs). Using a morphological analyzer, possible affixation and compounding were checked for over 50,000 Esperanto lexemes against a list of 17,000 root words. Morpheme boundaries in the resulting analyses were then checked manually, creating a CW dictionary of 28,000 words, representing 56.4% of the lexicon, or 19.4% of corpus tokens. The error rate of the EspGram morphological analyzer for new corpus CWs was 4.3% for types and 6.4% for tokens, with a recall of almost 100%; wrong or spurious boundaries were more common than missing ones. For pedagogical purposes, a morpheme frequency dictionary was constructed for a 16-million-word corpus, confirming the importance of agglutinative derivational morphemes in the Esperanto lexicon. Finally, as a means of reducing the morphological ambiguity of CWs, we provide POS likelihoods for Esperanto suffixes.
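
To make the idea of checking lexemes against a root list concrete, here is a minimal Python sketch. It is not EspGram and does not reproduce the paper's method: the tiny morpheme inventories, the function names `segment` and `morpheme_frequencies`, and the "fewest morphemes wins" tie-break are all assumptions made for illustration. It only handles bare dictionary forms (no plural or accusative endings).

```python
from collections import Counter

# Toy inventories, purely illustrative (NOT the paper's 17,000-root list).
PREFIXES = {"mal", "re", "ge"}
ROOTS = {"san", "lern", "bel", "patr", "dom"}
SUFFIXES = {"ul", "ej", "in", "et", "eg"}
ENDINGS = {"o", "a", "e", "i"}


def segment(word):
    """Return every analysis of `word` as prefix* root (root|suffix)* ending."""
    results = []

    def walk(rest, parts, seen_root):
        # Complete analysis: only a grammatical ending remains after a root.
        if seen_root and rest in ENDINGS:
            results.append(parts + [rest])
        # Try every proper split point of the remaining string.
        for i in range(1, len(rest)):
            head, tail = rest[:i], rest[i:]
            if not seen_root and head in PREFIXES:
                walk(tail, parts + [head], False)
            if head in ROOTS:
                walk(tail, parts + [head], True)
            if seen_root and head in SUFFIXES:
                walk(tail, parts + [head], True)

    walk(word, [], False)
    return results


def morpheme_frequencies(tokens):
    """Count morphemes over a token list, skipping unanalyzable tokens."""
    counts = Counter()
    for token in tokens:
        analyses = segment(token)
        if analyses:
            # Assumption: prefer the analysis with the fewest morphemes.
            counts.update(min(analyses, key=len))
    return counts


if __name__ == "__main__":
    print(segment("malsanulejo"))
    print(morpheme_frequencies(["malsanulejo", "lernejo", "domo", "belulino"]))
```

Because `segment` returns all candidate analyses, a larger inventory would surface exactly the kind of spurious boundaries the abstract mentions, which is why the manual checking step matters.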

Similar resources

Bound Morpheme Frequencies in the Performance of Iranian English Language Undergraduates and English Language Materials Developers in Written Descriptive Tasks

This mini-corpus, cross-linguistic, comparative, and norm-referenced study intends to render the most frequently and oft-used affixes in the written descriptive tasks in the performance of English language materials developers (ELMDs) and Iranian English language undergraduates (IELUs). Samples of writings of both groups were studied and analyzed through affixation principles. The frequency of ...

Morphology-Aware Spell-Checking Dictionary for Esperanto

The article describes the process of constructing a spell checker for the Esperanto language and its implementation as a dictionary (i.e. an affix file and a word list) for the Hunspell spell-checking engine. In comparison to existing solutions, the chosen approach takes note of morphologically complex words, which are common in Esperanto due to its agglutinative nature, and applies a set of ru...

Modeling Cross-morpheme Pro for Korean Large Vocabulary Cont

In this paper, we describe a cross-morpheme pronunciation variation model which is especially useful for constructing morpheme-based pronunciation lexicon for Korean LVCSR. There are a lot of pronunciation variations occurring at morpheme boundaries in continuous speech. Since phonemic context together with morphological category and morpheme boundary information affect Korean pronunciation var...

Pronunciation lexicon modeling and design for Korean large vocabulary continuous speech recognition

In this paper, we describe a pronunciation lexicon model which is especially useful for constructing morpheme-based pronunciation lexicon to improve the performance of a Korean LVCSR. There are a lot of pronunciation variations occurring at morpheme boundaries in continuous speech. For modeling of cross-morpheme pronunciation variations, we usually used a context-dependent multiple pronunciatio...

Invented Antonyms: Esperanto as a Semantic Lab

This paper uses Esperanto—a constructed language with transparent morphology but rich semantic-pragmatic components—to study antonymy and polarity. We investigate the distribution of the Esperanto antonymy morpheme ‘mal-’ (as in, for instance, ‘mal-alta’: antonym-tall, short) in a 4.3 million-word corpus, Tekstaro, and use it as an empirical basis to assess different theories of negative antony...

Publication date: 2016